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AH  ALGORITHM  FOR  THE  UNIVARIATE  ANALYSIS  OF  VARIANCE 
IN  EXPERIMENTS  WITH  REPEATED  MEASURES 


INTRODUCTION 

In  a  typical  repeated  measurements  experiment,  subjects  are  randomly 
assigned  to  different  treatment  groups,  and  observations  are  made  on  the  sub¬ 
jects  at  distinct  points  in  time  in  order  to  estimate  and  test  for  the  treat¬ 
ment  effects.  Under  conditions  given  by  Huynh  [5],  a  univariate  analysis  of 
variance  is  appropriate  for  this  purpose.  If  the  treatment  groups  are  of  equal 
size  and  every  subject  has  a  data  value  at  each  time,  the  analysis  can  be  read¬ 
ily  obtained  from  packaged  computer  programs  such  as  SAS  ANOVA,  SAS  GLM, 
BMDP2V,  BMDP4V,  or  BMDP8V  [1,4,9].  One  frequently  encounters  experimental  sit¬ 
uations  in  practice,  however,  where  the  cell  sizes  are  unequal  or  where  some  of 
the  subjects  have  incomplete  data.  The  SAS  ANOVA  and  BMDP8V  programs  were  not 
designed  for  use  in  such  cases,  and  there  are  situations  where  the  other  three 
programs  all  suffer  from  serious  drawbacks,  some  of  which  have  been  discussed 
by  Morris  et  al.  [8]. 

For  repeated  measurements  experiments  with  missing  data,  SAS  GLM  does  not 
include  any  provision  for  computing  least-squares  means  which  are  averaged 
across  treatment  groups  of  unequal  size.  The  least-squares  means  for  the 
highest  order  interactions  can  be  computed  provided  an  explicit  factor  for 
"between  subjects  within  treatment  groups"  is  included  in  the  model.  The 
computer  core  storage  and  central  processing  unit  (CPU)  time  used  grow  rapidly 
as  a  function  of  the  order  of  the  X ' X  matrix,  which  can  cause  the  analysis  to 
become  very  expensive  if  the  number  of  subjects  is  large,  as  demonstrated  by 
the  following.  These  examples  were  run  using  SAS  GLM  on  an  IBM  4341  Model  2 


under  a  multiple  virtual  storage 
are  the  number  of  subjects  in  the 
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1 
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AxB 

2 

2 
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Between  subjects  (AxB) 

27 

77 

127 

Time 

3 

3 
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AxTime 

3 

3 
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BxTime 

6 

6 

6 

AxBxTime 

6 

6 

6 

Error 

81 

231 

381 

Core  required  (K  bytes) 

256 

354 

504 

CPU  time  (min) 

.30 

1.62 

5.04 

If  any  subject  has  incomplete  data,  neither  BMDP2V  nor  BMDP4V  provides  a 
method  for  computing  properly  adjusted  least-squares  means.  Adjusted  F-ratlos 
can  be  easily  obtained  provided  all  treatment  groups  are  of  equal  size  by 


including  "between  subjects  within  treatment  groups"  as  a  factor  and  analyzing 
the  data  as  if  it  were  a  full  factorial  experiment  [2].  If  the  number  of  sub¬ 
jects  is  large,  however,  we  have  the  same  problems  discussed  for  SAS  GLM.  For 
situations  where  missing  values  are  present  and  the  cell  sizes  are  unbalanced, 
the  only  way  we  have  found  to  adjust  the  F-ratios  in  either  BMDP  program  is  to 
create  dummy  covariates.  This  can  be  a  laborious  exercise  or  a  tricky  program¬ 
ming  problem,  and  it  does  not  make  these  two  programs  generate  the  desired  sum¬ 
mary  means  because  the  adjusted  means  are  computed  as  if  the  dumniy  covariates 
were  true  covariates. 

This  report  describes  a  computing  procedure  for  the  univariate  analysis  of 
a  repeated  measurements  experiment  where  the  subjects  are  arranged  in  a  two-way 
classification  with  cell  frequencies  that  can  be  disproportionate.  The  analy¬ 
sis  is  adjusted  for  missing  values,  provided  their  number  and  configuration  do 
not  violate  certain  limitations.  The  method  proceeds  in  such  a  way  tha-;  the 
order  of  the  X'X  matrix  does  not  depend  on  the  number  of  subjects  in  the  exper¬ 
iment  (although  it  does  depend  on  the  number  of  missing  values).  The  algorithm 
has  been  incorporated  into  a  SAS  procedure,  REP2W1F,  which  produces  useful  sum¬ 
mary  statistics  (including  least-squares  means)  particular  to  the  design.  The 
approach  could  be  generalized  to  experiments  where  the  number  of  treatment 
factors  is  other  than  two  and  where  the  repeated  measures  have  a  factorial 
arrangement  of  their  own. 


EXPERIMENTAL  DESIGN  AND  MODEL  EQUATION 

In  the  particular  situation  we  will  consider  here,  N  subjects  are  randomly 
assigned  to  treatment  groups  which  are  arranged  in  a  two-way  classification 
with  factors  A  (having  levels  Aj,  i=l,2,...a)  and  B  (having  levels  B,. 
j=l,2,...b).  There  are  n^j  (>0)  subjects  at  the  ith  level  of  A  and  jf" 
level  of  B,  designated  by  tne  subscript  *=1,2, ...n^ j.  Each  subject  is  mea¬ 
sured  at  t  levels  of  a  fixed  Time  factor  (T^,  k=l,2,...t).  Tests  of  the  main 
effects  A  and  B  and  of  the  interaction  AxB  are  generally  referred  to  as  the 
between-subject  portion  of  the  analysis,  whereas  tests  for  a  main  effect  due  to 
Time  or  an  interaction  involving  Time  are  referred  to  as  belonging  to  the 
within-subject  part  of  the  analysis.  The  model  equation  for  the  case  of  no 
missing  values  appears  as  follows: 


Yijk*  uijk  +  eijk*’ 

where  uijk  =  u  +  Ai  +  B.  +  (AB)^  +  Tk  +  (AT)ik  +  (BT)jk  +  (ABT)i Jk ;  y,  A.,  Bjf 
( AB ) i  j »  ’  etc*«  are  t’Te  usual  fixed  main  effects  and  interactions;  and  e.jjk£ 
is  a  random  error  term. 


We  assume  that  the  components  of  model  equation  1  meet  a  number  of  condi¬ 
tions.  The  fixed  main  effects  and  interactions  are  assumed  to  follow  the  so- 
called  E-restrictions  [12].  These  are  the  same  as  the  "usual  constraints"  in 
which  the  sum  over  any  subscript  (i,  j,  k,  or  *)  of  any  set  of  fixed  main 
effects  or  interactions  containing  that  subscript  is  0.  For  example,  EA^  *  0 


2 


and  E(AB)ij  =  E(AB)jj  =  0.  Provision  for  weighted  restrictions  (such  as 

=  0)  coulil  also  have  been  made,  although  in  the  absence  of  substantial 

prior  knowledge  about  a  given  set  of  data,  the  E-restrictions  are  usually  rea¬ 
sonable. 

The  remaining  assumptions  involve  the  errors.  If  •,  represents  a 
t-vector  of  errors  for  the  subject  in  treatment  groupJn(i  ,j),  then  we 
assume  has  been  randomly  drawn  from  a  multivariate  normal  population 

with  meanJvector  0  and  covariance  matrix  E^j.  For  the  between-subject  por¬ 
tion  of  our  analysis  to  be  valid,  we  assume  in  addition  that  the  variance  of 
the  sum  over  time,  Ee.j^,  is  the  same  for  all  subjects  in  the  experiment.  The 
within-subject  portion  of  the  analysis  is  valid  if  and  only  if  the  errors  have 
the  property  of  multisample  sphericity  [5],  which  means  that  if  C  is  a 
(t-l)x(t)  orthogonal  contrast  matrix,  then  there  is  a  constant  \  for  which  the 
equation  CEjjC'  =  Mt-1  holds  independently  of  i  and  j.  (It-1  is  the  identity 
matrix  of  order  t-1). 

It  frequently  occurs  that  some  of  the  subjects  do  not  have  complete  data, 
a  circumstance  that  complicates  the  analysis.  The  analysis  described  here 
assumes  that  the  pattern  of  missing  values  is  random  and  independent  of  any 
treatment  effects.  At  present,  there  is  no  universally  accepted  method  for 
handling  all  parts  of  the  repeated  measurements  analysis  of  variance  when  miss¬ 
ing  values  occur  under  this  condition  [2,3].  To  compute  the  between-subject 
portion  of  the  analysis,  we  require  that  every  treatment  combination  (i , j )  has 
at  least  one  subject  with  complete  data,  and  that  at  least  one  such  treatment 
combination  has  two  or  more  subjects  with  complete  data.  If  we  denote  by  r-jj 
the  number  of  subjects  with  incomplete  data  in  the  ( i ,  j )  cell,  this  says  that 
(njj-1)  -  rij  is  nonnegative  for  every  cell  and  positive  for  at  least  one 
cell.  If  this  condition  is  met,  the  algorithm  produces  the  between-subject 
analysis  we  would  have  if  subjects  with  missing  values  were  completely 
excluded.  We  have  programmed  REP2W1F  so  that  if  the  condition  is  not  met,  a 
message  is  printed  informing  the  user  that  the  between-subject  analysis  cannot 
be  computed,  and  the  program  continues  with  the  within-subject  analysis. 

The  within-subject  portion  of  the  analysis  is  obtained  using  dummy  covari¬ 
ates  to  adjust  the  tests  for  missing  values.  The  resulting  within-subject 
analysis  is  equivalent  to  that  suggested  by  Schwertman  [10].  He  proved  that 
for  repeated  measurements  experiments  with  missing  values,  where  the  required 
assumptions  for  a  univariate  analysis  are  otherwise  met,  F-tests  of  null 
hypotheses  involving  the  within-subject  parameters  can  be  constructed  using 
ordinary  general  linear  regression  techniques  if  explicit  effects  for  subjects 
within  treatment  groups  ( S -j j a )  are  added  to  model  equation  1.  The  sum  of 
squares  for  each  hypotheses  h0  is  calculated  by  taking  the  difference  between 
the  residual  sum  of  squares  for  the  reduced  model  under  Hq  and  for  the  full 
model.  The  difficulty  with  implementing  Schwertman's  idea  directly  is  that  as 
the  number  of  subjects  increases,  the  order  of  the  X ' X  matrix  to  be  stored  and 
inverted  also  increases,  leading  to  core  storage  and  CPU  time  requirements 
that  can  become  very  expensive,  or  even  impossible,  to  meet.  The  algorithm 
presented  here  avoids  the  use  of  explicit  subject  effects  and  the  resulting 
problem. 


If  the  total  number  of  missing  values  in  the  experiment  is  m  >  0,  our 
technique  generates  for  each  one  a  dumn\y  covariate  Zyjji^  (v=l,2,...m)  which 
equals  -1  if  observation  (i,j,k,t)  is  the  vth  missing  value,  and  0  other¬ 
wise.  Model  equation  1  is  then  altered  as  follows: 

Yijk*  "  Mijk  +  ^v^^vijkt^  +  Gi jk £*  ^ 


where  pjjk  Is  the  same  as  in  equation  1;  Y-jj^  =  0  if  the  observation  is 
missing;  zvi jk £.  is  dummy  covariate  and  yv  the  regression  coefficient 
associated  with  the  v**1  missing  observation  (v=l,2,...m). 

It  should  be  noted  that  our  assumptions  concerning  multisample  sphericity 
of  the  error  term  are  not  altered  in  equation  2  by  the  presence  of  missing 
data.  If  Yjj^  =  0  represents  the  v^  missing  value,  then  we  have 


0  =  ^i  jk  + 


+  e.  . 
ijkt 


) +  ei« 


where  (»ijk  *  etjkl>  a  \  and  (-1)  ■  Zv1jkt  in  2- 

There  are  limitations  on  the  number  and  configuration  of  missing  observa¬ 
tions  we  can  have  for  any  treatment  combination  (i , j )  and  still  be  able  to  com¬ 
pute  the  within-subject  analysis.  Within  each  cell,  when  all  subjects  in  that 
cell  have  complete  data,  there  are  (njj-l)(t-l)  degrees  of  freedom  for  the 
residual  sum  of  squares.  Each  missing  observation  in  the  cell  costs  one  degree 

of  freedom,  so  that  if  there  are  m^j  missing  values  in  the  (i,j)  cell,  the 

number  of  degrees  of  freedom  the  cell  contributes  to  the  residual  sum  of 

squares  is  (njj-l)(t-l)  -  m-jj.  Our  algorithm  requires  that  this  quantity 
be  nonnegative  within  every  cell  and  that  it  be  positive  for  at  least  one 
cell.  In  addition,  certain  configurations  of  missing  values  violate  our 
requirements  for  the  within-subject  analysis.  Two  examples  would  be  missing 
data  for  1)  all  times  on  a  given  subject  or  2)  all  subjects  within  a  cell  for  a 
particular  time.  Either  too  many  missing  values  or  the  wrong  configuration  of 
missing  values  will  result  in  our  trying  to  invert  a  singular  matrix.  The 

matrix  inversion  routines  in  REP2W1F  recognize  singularities,  and  when  one  is 
found,  the  program  immediately  prints  a  message  to  indicate  what  has  happened 
and  terminates  the  analysis. 

When  both  the  between-  and  within-subject  analyses  can  be  computed, 
REP2W1F  produces  an  analysis  of  variance  table  which  includes  all  the  source 
lines  shown  in  Table  1.  The  use  of  dummy  covariates  to  adjust  source  lines  in 
the  table  for  missing  values  has  previously  been  recommended  by  Jarrett  [7] 
based  on  the  ideas  of  Wilkinson  [13],  The  approach  as  they  described  it 
involves  building  and  testing  effects  in  a  hierarchical  manner  instead  of 
making  use  of  the  E-restrictions  to  test  main  effects  and  first-order  interac¬ 
tions  adjusted  for  all  other  effects  in  the  model.  As  a  result,  their  tests 
for  effects  due  to  A,  B,  Time,  AxTime,  and  BxTime  would  differ  from  those  to  be 
presented  here. 
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TABLE  1.  SOURCE  LINES  AND  DEGREES  OF  FREEDOM  FOR  THE  ANALYSIS 
OF  VARIANCE  BASED  ON  MODEL  EQUATIONS  1  AND  2 


Between  subjects 


Within  subjects 


Source 

Degrees  of 
freedom 

A 

a-1 

B 

b-1 

AxB 

(a-1 )  (b-1 ) 

Error(a) 

lpij-l>  "  rij-* 

Time 

t-1 

AxTime 

(a-1 ) (t-1 ) 

BxT i me 

(b-1 ) (t-1 ) 

AxBxTime 

(a-1 ) (b-1 ) (t-1 ) 

Error(b) 

HCCij-iKt-n  -  -Hj] 

THE  MODEL  IN  MATRIX  NOTATION 

We  now  introduce  the  matrix  notation  we  use  in  describing  the  computing 
algorithm,  which  will  make  it  possible  to  present  the  derivations  in  a  more 
compact  form.  Starting  with  equations  1  and  2,  let  _Y  =  (Yijkt)  denote  the 
vector  of  observations  (including  zeros  for  missing  values),  sorted  so  that  k 
is  the  fastest  moving  index,  t,  the  next  fastest,  and  i  the  slowest.  Similarly, 
define  the  error  vector  £  =  (e-jji^)  so  that  it  conforms  with 

In  writing  the  parameter  vector  £,  we  eliminate  certain  of  the  main 
effects  and  interactions  which  are  redundant  due  to  the  E-restrictions.  For 
the  main  effects,  this  is  accomplished  by  arbitrarily  selecting  the  effect  with 
the  largest  subscript  for  deletion,  and  since,  for  example,  Aa  = 
-(Ai+A2+...Aa_i),  these  deleted  effects  are  not  needed  in  the  calculations  to 
follow.  Effects  for  a-1  levels  of  factor  A,  b-1  levels  for  factor  B,  and  t-1 
levels  of  Time  will  thus  be  included  in  and  any  interaction  effect  which 
involves  the  ath  level  of  A,  the  b^*1  level  of  B,  or  the  tth  level  of  Time 
can  be  deleted. 

It  is  convenient  to  write  £  as  a  partitioned  vector  with  js*  = 
where  contains  the  between-subject  parameters,  _g2  contains  the  within- 
subject  parameters,  and  £3  =  (yv)  contains  the  regression  coefficients 

associated  with  the  missing  values.  A  total  of  (a ) ( b )  elements  are  contained 
in  j?lf  including  (in  order)  u.  a-1  main  effects  from  factor  A,  b-1  main  effects 
from  factor  B,  and  (a-1) (b-1)  first-order  interaction  effects  from  AxB.  There 
are  (a)(b)(t-l)  elements  in  j32,  the  first  t-1  of  which  are  the  main  effects  for 
Time,  followed  by  (a-l)(t-l)  effects  for  AxTime,  (b-1 ) (t-1 )  effects  for  BxTime, 
and  (a-1) (b-1) (t-1)  effects  for  AxBxTime.  In  _e3  are  m  elements,  the  number  of 
missing  observations.  Shown  in  equation  4  for  the  case  of  a=2,  b*3,  and  t=2 
are  and  3^  • 


5 


£i  *  (u  Bi  B2  (AB) i i  (AB)12) 


and 

B2  =  (T i  (AT)n  (BT)n  (BT)21  (ABT)1U  (ABT)m).  (4) 


We  also  use  the  notation  for  a  t-vector  of  l's;  1^  for  an  identity 
matrix  of  order  t;  and  for  the  (t)x(t-l)  partitioned  matrix  defined  as 


(5) 


The  Kronecker  product  operator  §  is  defined  in  the  following  illustration.  If 
U  =  (u-j j )  is  a  (2)x(3)  matrix  and  V  is  a  matrix,  then  the  Kronecker  product 
of  U  and  V  is  the  partitioned  matrix 


U  8  V 


UnV  u^2V  U13V 

u2iV  u22v  u23v 

v.  y 


(6) 


Equation  2  can  now  be  rewritten  in  matrix  notation  as 


Y  *  (D8Jt  D8Kt  Z) 


£2 

£3 


+  e , 


(7) 


where  0  is  a  matrix  discussed  in  the  following  paragraph,  and  Z  =  Uvijkt) 
the  matrix  of  dummy  covariates  (needed  only  if  missing  values  occur). 

The  D  matrix  is  the  design  matrix  we  would  associate  with  j*i  for  an 
experiment  with  only  one  reading  per  subject  (that  is,  no  repeated  measures). 
It  has  one  row  for  each  subject  in  the  experiment  and  one  column  for  each  of 
the  (a)(b)  elements  of  8.1.  This  means  that  D  does  not  change  as  the  number  of 
repeated  measures  increases.  Our  requirement  that  (n-j j-1 ) (t-1 )  -  m^j  be 
nonnegative  within  every  treatment  combination  (i,j)  means  that  n,j  >  0  and  D 
is  of  full  column  rank.  An  example  of  the  D  matrix  for  the  case  of  a=2,  b=3, 
and  nfj=2  for  all  cells  Is  given  in  Table  2. 

Because  any  pair  of  error  elements  from  distinct  subjects  are  independent, 
the  variance  of  the  vector  £  is  block  diagonal.  The  matrices  down  the  diagonal 
we  denote  by  E^jj  (t=l,2,...nj j),  where  E^  is  the  covariance  matrix  for 
the  errors  of  the  subject  in  the  cell  (i,j).  Huynh  and  Feldt  [6]  showed 
that  because  of  multisample  sphericity,  each  Ejj*  may  be  written  as: 


'ijt 


(« 


utJt>  * 


(J 


-i  i  t. 


)  * 


(8) 
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where  o  —  ^  is  a  t-vector  which  may  change  from  subject  to  subject,  and  X  is 
a  constant  which  is  the  same  across  all  subjects. 


TABLE  2.  EXAMPLE  OF  THE  MATRIX  DENOTED  BY  D  IN  EQUATION  7  FOR 


THE 

CASE  OF  a= 

=2, 

b=3,  AND 

"ij=2 

FOR 

ALL  CELLS 

Subscripts 

of  Y 

Elements 

of  D 

i 

j 

* 

01  = 

u 

Ai 

Bi 

b2 

(AB)n 

(AB)i2 

1 

1 

1 

I 

1 

1 

0 

1 

0 

1 

1 

2 

1 

1 

1 

0 

1 

0 

1 

2 

1 

1 

,  1 

0 

1 

0 

1 

1 

2 

2 

1 

1 

0 

1 

0 

1 

1 

3 

1 

1 

1 

-1 

-1 

-1 

-1 

1 

3 

2 

1 

1 

-1 

-1 

-1 

-1 

2 

1 

1 

1 

-1 

1 

0 

-1 

0 

2 

1 

2 

1 

-1 

1 

0 

-1 

0 

2 

2 

1 

1 

-1 

0 

1 

0 

-1 

2 

2 

2 

1 

-1 

0 

i 

0 

-1 

2 

3 

1 

1 

-1 

-1 

-l 

1 

1 

2 

3 

2 

1 

-1 

-1 

-l 

1 

1 

COMPUTATIONAL  ALGORITHM 


Between  Subjects 

If  no  missing  values  occur  in  the  data  to  be  analyzed,  we  compute  the  aver¬ 
age  value  across  Time  for  each  subject,  using  the  notation  Y-jj.^  =  i.  EY-jj^. 
When  these  values  are  considered  in  terms  of  the  E-restrictions*  &n  the 
within-subject  parameters,  we  obtain  equation  9: 


Ai 


bj 


G  . 


ij’t* 


(9) 


In  our  matrix  notation,  if  there  were  N  subjects  in  the  experiment,  equation  9 
could  be  rewritten  as: 


Since  J£Kt  =  0,  equation  10  reduces  to 


(11) 


-(iN  •  j;)i  =  -(o  *  j;jt)ii 

t  c  t  ^ 


Equation  11  has  the  familiar  general  form  +  £  for  linear  models.  The 

assumptions  on  the  variances  of  the  errors,  namely  independence  between  sub¬ 
jects  and  homogeneity  of  across  subjects,  make  the  between-subject 

analysis  straightforward  using  well-known  methods  for  obtaining  tests  of  main 
effects  and  interactions,  all  mutually  adjusted  for  each  other  [12]. 

Useful  summary  statistics  printed  by  REP2W1F  for  the  between-subject  anal¬ 
ysis  include  the  table  of  AxB  means  (the  arithmetic  means  within  each  (i,j) 
cell)  and  the  marginal  means  for  A  and  B  (the  unweighted  averages  of  these  cell 
means ) . 

If  missing  values  do  occur  for  some  subjects,  an  appropriate  between- 
subject  analysis  can  still  be  computed  by  limiting  this  part  of  the  analysis  to 
only  those  subjects  with  complete  data,  which  REP2W1F  does,  provided  every 
(i,j)  cell  has  at  least  one  subject  with  complete  data,  and  at  least  one  cell 
has  two  or  more  subjects  with  complete  data.  When  these  last  two  conditions 
are  not  met,  REP2W1F  prints  a  message  to  this  effect  and  moves  on  to  the 
within-subject  analysis.  If  we  use  BMDP2V  or  BMDP4V  with  dummy  covariates  to 
compute  the  entire  analysis  in  a  single  call  of  the  program,  the  between- 
subject  portion  of  the  analysis  is  equivalent  to  that  just  described.  A  gen¬ 
eralized  least-squares  [11]  analysis,  making  use  of  available  data  from  sub¬ 
jects  with  values  missing  and  in  which  tests  of  the  between-subject  parameters 
are  adjusted  for  the  within-subject  factors,  would  require  additional  knowledge 
of,  or  assumptions  about,  the  covariance  matrix  of  the  error  vector  e_. 


Within  Subjects 

Consider  a  transformation  of  the  Y  values  in  which  the  last  observation 
for  each  of  the  N  subjects  is  subtracted  from  each  of  the  first  t-1  levels,  and 
the  last  value  then  dropped.  This  is  equivalent  to  multiplying  each  side  of 
equation  7  by  the  matrix  Ifl  8  K^.,  giving: 


(IN  «  ^)Y  =  (IN  8  K^)(D  8  Jt | D  8  Kt|Z) 


£i 

£2 


£3 


+  (!n 


(12) 


This  transformation  leads  to  two  simplifications:  first,  (In  8  KM(D  8  J^)  = 
(In  D)8(K^Jt),  but  K'J|-  =  _0,  which  means  that  the  coefficients  multiplying  the 
elements  of  B_i  in1  equation  12  are  all  0's;  second,  (In  8  K')(D  8  K^)  = 
0  8  K^K(-.  Equation  12  therefore  reduces  to  ' 


V)Z) 


£2 

£3 


+ 


(13) 
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Now  we  introduce  the  following  notation: 


and 


1*  =  (IN  »  K')Y, 
Z*  =  (IN  *  K')Z, 
£*  -  (IN  »  K-)e, 


(14) 


If  equation  13  is  rewritten  in  this  new  notation,  the  result  is 


Y*  =  (D  8  V  1*) 
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+  e*. 


(15) 


The  form  of  equation  15  is  the  familiar  linear  model. 

Now  we  need  to  find  the  variance  of  e*.  We  have  Var(e*)  = 
(In  8  K')Var(e)(lN  8  KjJ ,  where  Var(e)  is  block  diagonal  with  the  matrices  E-§ j £ 
from  equation  8  down  the  diagonal.  Var(e*)  is  also  block  diagonal,  with  tne 
matrices  K^E-j j fcKt  on  the  diagonal.  Finally,  using  equation  8  we  have 


K '  E 
t 


i  j*Kt 


♦  »t)Kt  •  ■  »2v- 


(16) 


since  J^K<-  =  O'  and  K^Jt  =  0. 


Equations  14,  15,  and  16  combine  to  give  equation  17. 
Var(e*)  =  o2(IN  8  V), 


(17) 


where  V  is  known  and  a2  is  unknown.  Because  of  equations  15  and  17,  the  normal 
equations  to  be  solved  in  obtaining  the  generalized  least-squares  estimates 
[11]  of  82  and  b_3  are  as  follows: 


(18) 


D'D  8  V 

(D*  8  It_l)Z*  ' 

A 

£2 

D'  8  It_l  ' 

Y* 

z*'(0  a  it_i) 

Z*'(IN  8  V"l)Z* 

J 

A 

£3 

Z*‘(IN  8  V-1) 

You  can  reduce  computational  effort  by  observing  that  Y*  and  Z*  can  be 
partitioned  as  shown  in  equation  19,  with  each  pair  Qf*,Z*)  associated  with 
one  subject. 


Y* 
— 1 

z*  = 

z* 

1 

Yi 

z* 

-2 

2 

• 

UJ 

k 

V  / 

This  produces  normal  equations  given  by 


D'D  8  V 

(O’  a  It_j)z*' 

«£| 

V _ 

(O'  •  It.!)!* 

Z*'(D  «  >t.i) 

z(z;’v-‘z;) 

J 

r 

|T30> 

»  - 

(19) 


(20) 


In  equation  20  the  only  matrices  which  must  be  stored  and  whose  orders  depend 
on  N  are  Y*  and  Z*;  the  normal  equations  shown  are  in  the  form  we  use  for 
computational  purposes  in  REP2W1F.  Conceptually,  however,  it  is  helpful  in  the 
remaining  discussion  to  back  up  to  equation  18  and  rewrite  it  as 


WO  =  UY* 


(21) 


where 


W  = 


o  = 


D'D  8  V 

(O’  a  It_1)z* 

Z*' (D  8  It.j) 

Z*'(IN  a  v_1 )z* 

j 

£2 

A 

£3 

V  J 


and 


U  = 


o'  «  It-! 
Z*'(lN  »  V-1) 
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The  within-subject  parameter  estimates  can  be  computed  as  in  equation  22 
provided  W  is  nonsingular. 

8  =  W_1UY*.  (22) 


The  order  of  the  matrix  W  is  (a ) (b) (t-1 )  +  m,  which  does  not  involve  N.  This 

is  where  the  savings  in  computer  core  storage  and  CPU  processing-tine 

requirements  are  achieved  by  REP2W1F  relative  to  SAS  GLM.  If  m  =  0  (meaning 
there  are  no  missing  values),  then  W  =  D'D  8  V  can  be  shown  to  be  a  nonsingular 
matrix  under  the  condition  we  have  Imposed  that  (n-, j-1 ) (t-1 )  be  nonnegative 
within  every  cell  (i,j)  and  positive  for  at  least  one  cell.  Under  this 

condition,  the  rank  of  D'D,  which  is  (a)(b),  equals  the  order  of  D'D,  so  that 

D'D  is  nonsingular.  Additionally,  V  is  nonsingular.  Since  the  eigenvalues  of 
D'D  8  V  are  equal  to  6rv$,  where  5r  is  the  r^  eigenvalue  of  D'D  and  vs  Is  the 

sth  eigenvalue  of  V,  D'D  8  V  has  (a ) (b) (t-1 )  nonzero  eigenvalues  and  is  thus 

nonsingular. 

If  m  >  0,  the  possibility  exists  that  W  might  be  singular.  This  will 

happen  if  (n-j j-1 ) (t-1 )  -  my  <  0  for  one  of  the  cells  (i,j).  It  can  also 

happen  for  certain  configurations  of  missing  values  such  as  all  times  missing 
for  a  given  subject  or  all  subjects  missing  within  a  cell  for  a  particular 
time.  The  matrix  inversion  routine  in  REP2W1F  uses  Cholesky  decomposition  in 
finding  W"1.  When  the  lower  triangular  matrix  T  such  that  W  *  TT'  has  been 
found,  the  elements  along  the  diagonal  of  T  a re_  compared.  If  the  ratio  of  the 
smallest  to  largest  elements  is  less  than  10"5,  the  matrix  is  declared  to 
be  singular.  In  such  cases  a  message  is  printed  and  the  particular  analysis 
terminate-.. 

Provided  W  is  nonsingular,  the  quantitites  needed  to  test  linear  hypothe¬ 
ses  of  the  form  C£  «  £  are  straightforward  [11,  pp.  110-112]. 

SSE  =I*'(In  8  V" 1  )JT*  -  (UY^'W^UY*)  (23) 


with  degrees  of  freedom  (N-(a)(b))(t-l)  -  m, 

o2  =  SSE  /  {(N-(a)(b))(t-l)  -  m},  (24) 

Q  =  estimated  variance  of  9  =  o2(W_1U(In  8  V)U'W_i),  (25) 

SSH(H0:  C9  =  0)  =  (Ce) ' (CQC ' )-1(C©).  (26) 

where  C  is  required  to  have  full  row  rank  equal  to  h,  which  is  also  the  degrees 
of  freedom  for  SSH.  REP2W1F  employs  efficient  algorithms  equivalent  to  these 
formulas  to  calculate  the  sums  of  squares  for  lines  in  the  within-subject 
portion  of  Table  1.  To  illustrate  In  more  detail  what  is  being  computed, 
suppose  a=2,  b=3,  t=2,  and  m=2,  so  that 
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(27) 


£2  *  (Ti  (AT)n  (BT)U  (BT)2l  (ABT)m  (ABT)121) 


and 


£3  =  (yi  y2). 


The  adjusted  hypothesis  sums  of  squares  to  test  for  an  overall  main  effect  due 
to  Time  and  for  interactions  involving  Time  are  shown  in  equation  26  with  C 
given  as: 


and 


Time: 
AxT ime: 
B'jcT  fine: 


c  =  (  1  0  0  0  0  0  0  0  ), 


c  =  (  0  1  0  0  0  0  0  0  ), 


■e 


o  ei  00000 
00010000 


AxBxTime: 


C  = 


ooooioool. 

nooooiooj 


(28) 


The  sum  of  squares  for  Error (b)  is  SSE  In  equation  23. 

All  of  these  sums  of  squares  are  the  same  as  the  Type  III  and  Type  IV  sums 
of  squares  from  SAS  GLM  when  "subjects  within  treatment  groups"  is  Included  as 
a  term  in  the  model  for  that  program.  These  are  also  the  same  as  the  sums  of 
squares  given  by  BM0P2V  and  BMDP4V  when  dumriy  covariates  are  used  in  these 
programs  to  adjust  for  missing  values. 

Adjusted  means  for  the  within-subject  portion  of  the  analysis  are  obtained 

after  inserting  the  estimates  of  the  missing  values  (frcmj^)  back  Into  their 
respective  locations  in  the  original  data  vector  _Y.  The  "AxBxTime  means  are 
computed  from  the  reconstituted  Y  vector.  These  are  the  same  as  the  AxBxTime 
least-squares  means  produced  by  3AS  GLM.  The  marginal  means  for  Time,  AxTime, 
and  BxTime  in  REP2W1F  are  the  unweighted  averages  of  these  adjusted  AxBxTime 
means.  The  SAS  GLM  procedure  can  be  manipulated  to  give  these  means,  provided 
we  do  not  have  both  unbalanced  treatment  groups  and  missing  data  simulta¬ 
neously.  The  BMDP4V  program  can  generate  these  means  if  the  cell  sizes  are 
unequal,  but  none  of  the  BMDP  programs  print  useful  summary  means  unless  all 
subjects  have  complete  data. 

Standard  errors  for  the  adjusted  means  are  also  produced  by  REP2W1F. 
Morris  et  al.  [8]  have  discussed  the  shortcomings  of  SAS  GLM  and  BMDP2V  in  this 
regard.  The  fact  that  REP2W1F  is  written  to  analyze  data  from  one  particular 
design  makes  it  possible  to  compute  the  standard  errors  of  Interest  (although 
in  the  case  of  missing  data,  these  are  only  approximate). 


CONCLUSION 


We  have  described  the  method  of  analysis  incorporated  in  REP2W1F,  a  SAS 
procedure  for  the  univariate  analysis  of  a  repeated  measurements  experiment 
where  the  subjects  are  arranged  in  a  two-way  classification  with  treatment 
groups  that  may  be  unequal  in  size.  The  program  assumes  a  E-restricted  model 
[12]  to  produce  analysis  of  variance  F-tests  and  least-squares  means  that  are 
adjusted  for  incomplete  data.  The  computer  resources  required  by  REP2W1F  to 
obtain  least-squares  means  when  we  have  missing  values  do  not  escalate  nearly 
as  rapidly  as  a  function  of  the  number  of  subjects  in  the  experiment  as  SAS 
GLM.  For  example,  in  our  introduction  we  showed  a  situation  where  SAS  GLM 
required  504K  bytes  of  storage  and  5.04  minutes  of  CPU  time.  The  same  analysis 
using  REP2W1F  consumed  202K  bytes  and  0.23  minutes.  When  any  data  are  missing, 
BMDP2V  and  BMDP4V  do  not  produce  any  useful  summary  least-squares  means  at 
all.  The  REP2W1F  program  produces  standard  errors  which  can  be  used  to  place 
confidence  intervals  on,  or  test  for  differences  between,  selected 
least-squares  means.  As  pointed  out  by  Morris  et  al.  [8],  SAS  GLM  and 
BM0P2V  are  deficient  in  this  regard.  Our  studies  of  BMDP4V  indicate  that  it  is 
also  deficient.  A  generalization  of  the  algorithm  in  REP2W1F  to  situations 
where  the  number  of  treatment  factors  is  other  than  two,  or  where  the  repeated 
measures  have  a  factorial  arrangement  of  their  own,  would  be  straightforward. 
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